Realistic Image Generation from Text by Using BERT-Based Embedding
Abstract
Recently, in the field of artificial intelligence, multimodal learning has received a lot of attention due to expectations for enhanced AI performance and potential applications. Text-to-image generation, one of the multimodal tasks, is a challenging topic in computer vision and natural language processing. The text-to-image generation model based on a generative adversarial network (GAN) utilizes a text encoder pre-trained with image-text pairs. However, text encoders pre-trained with image-text pairs cannot obtain rich information about texts not seen during pre-training, so it is hard to generate an image that semantically matches a given text description. In this paper, we propose a new text-to-image generation model using pre-trained BERT, which is widely used in natural language processing. The pre-trained BERT is used as the text encoder after fine-tuning on a large amount of text, so that it obtains rich information about the text and is suitable for the image generation task. Through experiments on a benchmark dataset, we show that the proposed method improves over the baseline both quantitatively and qualitatively.
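The abstract does not say exactly how the BERT embedding is fed to the GAN generator. The sketch below assumes a common conditional-GAN pattern, which the source does not confirm: pool a sentence vector from BERT's per-token outputs, concatenate it with a noise vector, and project the result into the generator's first feature map. The token features are random stand-ins for the `last_hidden_state` a fine-tuned BERT would produce (for example via Hugging Face `transformers`). The noise size, caption length, and 4x4x64 projection are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 768      # hidden size of BERT-base
NOISE_DIM = 100   # hypothetical noise dimension
SEQ_LEN = 12      # hypothetical padded caption length


def mean_pool(token_feats: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token features over real (non-padding) positions only."""
    mask = mask[:, None].astype(token_feats.dtype)  # shape (seq_len, 1)
    return (token_feats * mask).sum(axis=0) / mask.sum()


def generator_input(sentence_vec, z, weight, bias):
    """Concatenate text condition and noise, then apply one linear layer + ReLU."""
    h = np.concatenate([sentence_vec, z])      # shape (HIDDEN + NOISE_DIM,)
    return np.maximum(weight @ h + bias, 0.0)  # stand-in for the generator's first layer


# Stand-in for BERT's last_hidden_state for one caption (random, illustration only).
token_feats = rng.standard_normal((SEQ_LEN, HIDDEN)).astype(np.float32)
mask = np.array([1] * 8 + [0] * 4)  # 8 real tokens followed by 4 padding tokens

sentence_vec = mean_pool(token_feats, mask)
z = rng.standard_normal(NOISE_DIM).astype(np.float32)

# Hypothetical first generator layer projecting to a 64-channel 4x4 feature map.
OUT = 4 * 4 * 64
weight = (rng.standard_normal((OUT, HIDDEN + NOISE_DIM)) * 0.02).astype(np.float32)
bias = np.zeros(OUT, dtype=np.float32)

features = generator_input(sentence_vec, z, weight, bias).reshape(64, 4, 4)
print(sentence_vec.shape, features.shape)
```

Mean pooling with the attention mask keeps padding tokens from diluting the sentence vector. Using the `[CLS]` token output instead would be an equally plausible assumption.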
Similar Resources
Conditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
Groundtruth Image Generation from Electronic Text (Demonstration)
The problem of generating synthetic data for the training and evaluating of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, there is a tremendous need to be able to rapidly generate data in new languages and scripts, without the need to develop specialized systems. We have developed an approach that uses langua...
Realistic Image Generation for Testing Vision-based Autonomous Rendezvous
This paper describes the development of a tool for generating realistic synthetic images of planetary rovers and planet surfaces for the purpose of testing vision-based autonomy algorithms. Such algorithms have been used on the NASA Mars rovers and will be used heavily on ExoMars for navigation. Computer simulation is a useful complement to testing in artificial physical test beds and natural t...
Dual-Path Convolutional Image-Text Embedding
This paper considers the task of matching images and sentences. The challenge consists in discriminatively embedding the two modalities onto a shared visual-textual space. Existing work in this field largely uses Recurrent Neural Networks (RNN) for text feature learning and employs off-the-shelf Convolutional Neural Networks (CNN) for image feature extraction. Our system, in comparison, differs...
Journal
Journal title: Electronics
Year: 2022
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics11050764